Contrastive Vision-Language Pre-training with Limited Resources

Abstract

Pioneering dual-encoder pre-training works (e.g., CLIP and ALIGN) have revealed the potential of aligning multi-modal representations with contrastive learning. However, these works require a tremendous amount of data and computational resources (e.g., billion-level web data and hundreds of GPUs), which prevents researchers with limited resources from reproduction and further exploration. To this end, we propose a stack of novel methods that significantly cut down the heavy resource dependency and allow us to conduct dual-encoder multi-modal representation alignment with limited resources. Besides, we provide a reproducible baseline with competitive results, namely ZeroVL, using only 14M samples from publicly accessible academic datasets and 8 V100 GPUs. Additionally, we collect 100M web data for pre-training and achieve results comparable or superior to state-of-the-art methods, further proving the effectiveness of our methods on large-scale data. We hope that this work will provide useful data points and experience for future research in contrastive vision-language pre-training. Code is available at https://github.com/zerovl/ZeroVL .
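
For readers unfamiliar with the dual-encoder contrastive objective the abstract refers to, the following is a minimal sketch of a generic symmetric image-text contrastive loss of the kind popularized by CLIP. It is not the ZeroVL implementation and does not include any of the resource-saving methods the paper proposes. The function names, the toy embeddings, and the temperature value of 0.07 are assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Generic symmetric contrastive loss over a batch of paired embeddings.

    image_emb[k] and text_emb[k] are assumed to come from the same
    image-text pair; every other combination in the batch is a negative.
    The temperature default is an illustrative assumption.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)

    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should put its mass on the diagonal.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: same idea over the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy usage: correctly paired embeddings should score a lower loss than swapped ones.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = symmetric_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

Because every other item in the batch serves as a negative, this style of objective tends to benefit from large batches, which is one reason such training is usually resource-hungry.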

Similar resources

Language identification with limited resources

Language identification is an important issue in many speech applications. We address this problem from the point of view of classification of sequences of phonemes, given the assumption that each language has its own phonotactic characteristics. In order to achieve this classification, we have to decode the speech utterances in terms of phonemes. The set of phonemes must be the same for all th...

Transliteration Generation and Mining with Limited Training Resources

We present DIRECTL+: an online discriminative sequence prediction model based on many-to-many alignments, which is further augmented by the incorporation of joint n-gram features. Experimental results show improvement over the results achieved by DIRECTL in 2009. We also explore a number of diverse resource-free and language-independent approaches to transliteration mining, which range from sim...

TREQ-AL: A word alignment system with limited language resources

We provide a rather informal presentation of a prototype system for word alignment based on our previous translation equivalence approach, discuss the problems encountered in the shared-task on word-aligning of a parallel Romanian-English text, present the preliminary evaluation results and suggest further ways of improving the alignment accuracy.

Project selection with limited resources in data envelopment analysis

This paper discusses allocating a fixed resource for producing a finite number of projects in order to obtain a desired level of efficiency. It is assumed that a vector of limited resources is at hand. This vector of resources may contain human resources, budget, equipment, and facilities. In any firm there exist different suggestions from subunits for running...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-20059-5_14